ABSTRACT

We use a modified version of the halo-based group finder developed by Yang et al. to select galaxy 
groups from the Sloan Digital Sky Survey (SDSS DR4). In the first step, a combination of two methods 
is used to identify the centers of potential groups and to estimate their characteristic luminosity. Using 
an iterative approach, the adaptive group finder then uses the average mass-to-light ratios of groups, 
obtained from the previous iteration, to assign a tentative mass to each group. This mass is then 
used to estimate the size and velocity dispersion of the underlying halo that hosts the group, which 
in turn is used to determine group membership in redshift space. Finally, each individual group is 
assigned two different halo masses: one based on its characteristic luminosity, and the other based 
on its characteristic stellar mass. Applying the group finder to the SDSS DR4, we obtain 301237 
groups in a broad dynamic range, including systems of isolated galaxies. We use detailed mock galaxy 
catalogues constructed for the SDSS DR4 to test the performance of our group finder in terms of 
completeness of true members, contamination by interlopers, and accuracy of the assigned masses. 
This paper is the first in a series and focuses on the selection procedure, tests of the reliability of the 
group finder, and the basic properties of the group catalogue (e.g. the mass-to-light ratios, the halo 
mass to stellar mass ratios, etc.). The group catalogues including the membership of the groups are 
available at these links.

Subject headings: dark matter - large-scale structure of the universe - galaxies: halos - methods: 
statistical 



1. INTRODUCTION 

Galaxies are thought to form and reside in extended 
cold dark matter haloes. One of the ultimate challenges 
in astrophysics is therefore to obtain a detailed under- 
standing of how galaxies with different physical proper- 
ties occupy dark matter haloes of different mass. This re- 
lationship not only conveys important information about 
how different galaxies form and evolve in different dark 
matter haloes, but it also provides the necessary basis for 
translating the observed distribution of galaxies into the 
large-scale distribution of matter throughout the Uni- 
verse. 

Theoretically, the relationship between galaxies and 
dark matter haloes can be studied using numerical sim- 
ulations (e.g., Katz, Weinberg & Hernquist 1996; Pearce 
et al. 2000; Springel 2005; Springel et al. 2005) or semi- 
analytical models (e.g. White & Frenk 1991; Kauffmann 
et al. 1993, 2004; Somerville & Primack 1999; Cole et 
al. 2000; van den Bosch 2002; Kang et al. 2005; Croton 
et al. 2006). Both of these techniques try to model the 
process of galaxy formation ab initio. However, since our 
understanding of the various physical processes involved 
is still relatively poor, the relations between the properties
of galaxies and their dark matter haloes predicted by
these simulations and semi-analytical models still need to 
be tested against observations. 

More recently, the halo occupation model has opened 
another avenue to probe the galaxy-dark matter connec- 
tion (e.g. Jing, Mo & Borner 1998; Peacock & Smith 
2000; Berlind & Weinberg 2002; Cooray & Sheth 2002;
Scranton 2003; Yang, Mo & van den Bosch 2003; van den 
Bosch, Yang & Mo 2003; Yan, Madgwick & White 2003; 
Tinker et al. 2005; Zheng et al. 2005; Cooray 2006; Vale 
& Ostriker 2006; van den Bosch et al. 2007). This tech- 
nique uses the observed galaxy luminosity function and 
two-point correlation functions to constrain the average 
number of galaxies of given properties that occupy a dark 
matter halo of given mass. Although this method has the 
advantage that it can typically yield much better fits to 
the data than the semi-analytical models or numerical 
simulations, one typically needs to assume a somewhat 
ad-hoc functional form to describe the halo occupation 
model. 

A more direct way of studying the galaxy-halo connec- 
tion is by using galaxy groups, provided that these are 
defined as sets of galaxies that reside in the same dark 
matter halo^. With a well-defined galaxy group cata- 
logue, one can not only study the properties of galax- 
ies as function of their group properties (e.g. Yang et 
al. 2005c, d; Collister & Lahav 2005; van den Bosch et
al. 2005; Robotham 2006; Zandivarez et al. 2006; Wein- 
mann et al. 2006a, b) but one can also probe how dark 
matter haloes trace the large-scale structure of the uni- 
verse (e.g. Yang et al. 2005b, 2006; Coil et al. 2006; 
Berlind et al. 2007).

^ In this paper, we refer to a system of galaxies as a group
regardless of its richness, including isolated galaxies (i.e., groups
with a single member) and rich clusters of galaxies.

During the past two decades, numerous
group catalogues have been constructed from various
galaxy redshift surveys, most noticeably the CfA red- 
shift survey (e.g. Geller & Huchra 1983), the Las Cam- 
panas Redshift Survey (e.g. Tucker et al. 2000), the 2- 
degree Field Galaxy Redshift Survey (hereafter 2dFGRS; 
Merchan & Zandivarez 2002; Eke et al. 2004, Yang et 
al. 2005a; Tago et al. 2006; Einasto et al. 2007), the high- 
redshift DEEP2 survey (Gerke et al. 2005), and the Two 
Micron All Sky Redshift Survey (Crook et al. 2007). Various
group catalogues have also been constructed from
the redshift samples selected from the on-going Sloan 
Digital Sky Survey (hereafter SDSS): Goto (2005) and
Berlind et al. (2006) used a friends-of-friends (FOF) algorithm
to identify groups in the SDSS Data Release 2
(DR2; Abazajian et al. 2004), Miller et al. (2005) used 
the C4 algorithm to find clusters in the SDSS DR2, Wein- 
mann et al. (2006a) used the halo-based group finder 
of Yang et al. (2005a) to identify groups in the New 
York University Value-Added Galaxy Catalogue (NYU- 
VAGC) of Blanton et al. (2005) which is also based on 
the SDSS DR2, and Merchan & Zandivarez (2005) used 
an FOF algorithm to identify groups in the SDSS DR3
(Abazajian et al. 2005). Group catalogues have also been 
constructed from the SDSS photometric data. Goto et 
al. (2002) developed a cut-and-enhance method and ap- 
plied it to the early SDSS commissioning data. Bahcall 
et al. (2003) compared the properties of groups selected 
from the early SDSS commissioning data with two dif- 
ferent selection methods, a hybrid matched filter method 
(Kim 2002) and a "maxBCG" method developed by An- 
nis et al. (1999). Lee (2004) identified compact groups 
in the SDSS Early Data Release (EDR; Stoughton et 
al. 2002). More recently, Koester et al. (2007) used the 
"maxBCG" method to assemble a large photometrically 
selected galaxy group catalogue from the SDSS with a 
sky coverage of $\sim 7500\,{\rm deg}^2$. Photometric catalogues also
exist outside the SDSS (e.g. Gonzalez et al. 2001; Glad- 
ders & Yee 2005). 

In a recent paper, Yang et al. (2005a) developed a halo- 
based group finder that is optimized for grouping galaxies 
that reside in the same dark matter halo. Using mock 
galaxy redshift surveys constructed from the conditional 
luminosity function model (see Yang et al. 2004), they 
found that this group finder is very successful in asso- 
ciating galaxies according to their common dark matter 
haloes. In particular, the group finder also performs reliably
for poor systems, including isolated galaxies in small
mass haloes. This makes this halo-based group finder 
ideally suited to study the relation between galaxies and 
dark matter haloes over a wide dynamic range in halo 
masses. Thus far, the halo-based group finder has been 
applied to both the 2dFGRS (Yang et al. 2005a) and to
the SDSS DR2 (Weinmann et al. 2006a). In this paper, 
we apply a slightly modified and improved version to the 
NYU-VAGC based on the SDSS DR4. As the first in a se- 
ries, this paper focuses on the selection process and the 
basic properties of the group catalogue. More detailed 
analyses of the group properties and the implications for 
halo occupation statistics and galaxy formation will be 
presented in forthcoming papers. 

This paper is organized as follows. Section 2 gives a
brief description of the SDSS data used in this paper. In
Section 3 we describe the halo-based group finder and the
methods to assign halo masses to the groups. In Section
4 we present the group catalogue based on the SDSS
DR4, and study some of its basic properties. Finally,
we summarize our results in Section 5. Unless stated
otherwise, we adopt a $\Lambda$CDM cosmology with parameters
that are consistent with the three-year data release
of the WMAP mission (hereafter WMAP3 cosmology):
$\Omega_m = 0.238$, $\Omega_\Lambda = 0.762$, $\Omega_b = 0.042$, $n = 0.951$,
$h = H_0/(100\,{\rm km\,s^{-1}\,Mpc^{-1}}) = 0.73$ and $\sigma_8 = 0.75$
(Spergel et al. 2007).

2. GALAXY SAMPLES 

The data used in this paper are taken from the Sloan
Digital Sky Survey (SDSS; York et al. 2000), a joint five-passband
$(u, g, r, i, z)$ imaging and medium-resolution
($R \sim 1800$) spectroscopic survey. More specifically,
we make use of the New York University Value-Added 
Galaxy Catalogue (NYU-VAGC; see Blanton et al. 2005), 
which is based on SDSS DR4 (Adelman-McCarthy et 
al. 2006) but includes a set of significant improvements 
over the original pipelines. From this catalogue we select 
all galaxies in the Main Galaxy Sample with redshifts in 
the range 0.01 < z < 0.20 and with a redshift complete- 
ness C > 0.7 (about 4% of the galaxies have C < 0.7). 
This leaves a grand total of 362356 galaxies with reliable 
r-band magnitudes and with measured redshifts from the 
SDSS. We will refer to this sample of galaxies as Sample 
I. 

In addition, there are 7091 galaxies with 0.01 < z < 
0.20 in the NYU-VAGC which have redshifts from alternative
sources: from the 2dFGRS (Colless et al. 2001),
from the PSCz (Saunders et al. 2000) or from the RC3
(de Vaucouleurs et al. 1991) ^. Including these galaxies
results in Sample II, with a total of 369447 galaxies.
As an illustration, Fig. 1 shows the sky coverage
($\sim 4514\,{\rm deg}^2$) of all galaxies in Sample II in Galactic coordinates,
overlaid on the galactic extinction contours of
Schlegel, Finkbeiner & Davis (1998). 

The two samples described above suffer from incom- 
pleteness due to fiber collisions. No two fibers on the 
same SDSS plate can be closer than 55 arcsec. Although 
this fiber collision constraint is partially alleviated by the 
fact that neighboring plates have overlap regions, ~ 7 
percent of all galaxies eligible for spectroscopy do not 
have a measured redshift. Hereafter we refer to these 
galaxies as 'fiber-collision' galaxies. Since fiber collisions 
are more frequent in regions of high (projected) density, 
they are more likely to occur in richer groups, thus caus- 
ing a systematic bias that may need to be accounted for. 
A simple method of doing so is to assign a galaxy which 
lacks an observed redshift due to fiber collisions the red- 
shift of the galaxy with which it collided. As shown in Ze- 
havi et al. (2002), roughly 60 percent of the fiber-collision 
galaxies have a redshift within $500\,{\rm km\,s^{-1}}$ of their nearest
neighbor, and for these cases the above procedure is
more than appropriate. However, there are also cases in 
which the fiber-collision galaxy has a true redshift that 
is very different from that of its nearest neighbor. If 
the fiber-collision galaxy is assigned a redshift that is 
too large, its implied luminosity will also be too large, 
and can in fact become excessively large. This in turn 
can have dramatic consequences for our group finder. To 

See Blanton et al. (2005) for details. 
Fig. 1. The sky coverage of the SDSS DR4 galaxies in Sample II, overlaid on the galactic extinction contours of Schlegel, Finkbeiner
& Davis (1998). Note that the SDSS probes regions of low galactic extinction. 



limit the impact of these catastrophic failures we remove
the ~1.0 percent of all fiber-collision galaxies that have
an implied absolute magnitude of $^{0.1}M_r - 5\log h < -22.5$
(see eq. 1 below). In our redshift interval, there are a
total of 38672 galaxies with an assigned redshift and with
$^{0.1}M_r - 5\log h > -22.5$. Including these galaxies results
in Sample III, with a total of 408119 galaxies.

In what follows, we use Sample II as our main sample 
for selecting galaxy groups. For completeness, we will 
also apply our group finder to Samples I and III, and
we will occasionally compare results based on all three 
group catalogues. 

2.1. Magnitudes and stellar masses 

For each galaxy we compute the absolute magnitude
in bandpass $Q$ using

$$^{0.1}M_Q - 5\log h = m_Q + \Delta m_Q - DM(z) - K_Q - E_Q \tag{1}$$

Here $DM(z) = 5\log[D_L/(h^{-1}{\rm Mpc})] + 25$ is the bolometric
distance modulus calculated from the luminosity
distance $D_L$ using a WMAP3 cosmology with $\Omega_m = 0.24$
and $\Omega_\Lambda = 0.76$. $\Delta m_Q$ is the latest zero-point correction
for the apparent magnitudes, which converts the
SDSS magnitudes to the AB system, and for which we
adopt $\Delta m_Q = (-0.036, +0.012, +0.010, +0.028, +0.040)$
for $Q = (u, g, r, i, z)$ (Michael Blanton, private communication).
All absolute magnitudes are $K + E$ corrected to
$z = 0.1$. For the $K$ corrections we use the latest version
of 'Kcorrect' (v4) described in Blanton et al. (2003a; see
also Blanton & Roweis 2007), which we apply to all galaxies
that have meaningful magnitudes and meaningful redshifts,
including those that have redshifts from alternative
sources and those that have been assigned the redshift
of their nearest neighbor. Finally, the evolution corrections
to $z = 0.1$ are computed using $E_Q = A_Q(z - 0.1)$,
with $A_Q = (-4.22, -2.04, -1.62, -1.61, -0.76)$ for
$Q = (u, g, r, i, z)$ (see Blanton et al. 2003a). Note that these
evolution corrections imply that galaxies were brighter
in the past (at higher redshifts).
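
As a rough illustration of equation (1), the following Python sketch evaluates the distance modulus by direct numerical integration in a flat cosmology and then applies the zero-point and evolution corrections quoted above. The function names, the trapezoidal integration, and the assumption of exact flatness ($\Omega_\Lambda = 1 - \Omega_m$) are illustrative choices, not part of the original pipeline; the K correction is taken as an input because 'Kcorrect' is not reimplemented here.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]
BANDS = ("u", "g", "r", "i", "z")
# Zero-point offsets and evolution coefficients as quoted in the text.
DELTA_M = dict(zip(BANDS, (-0.036, +0.012, +0.010, +0.028, +0.040)))
A_EVOL = dict(zip(BANDS, (-4.22, -2.04, -1.62, -1.61, -0.76)))


def luminosity_distance(z, omega_m=0.24, n_steps=1000):
    """Luminosity distance in h^-1 Mpc for a flat LCDM model (assumed), z > 0."""
    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    dz = z / n_steps
    # Trapezoidal rule for the comoving-distance integral of dz / E(z).
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(k * dz) for k in range(1, n_steps))
    d_comoving = (C_KM_S / 100.0) * integral * dz  # c/H0 = 2997.9 h^-1 Mpc
    return (1.0 + z) * d_comoving


def distance_modulus(z, omega_m=0.24):
    """DM(z) = 5 log10[D_L / (h^-1 Mpc)] + 25."""
    return 5.0 * math.log10(luminosity_distance(z, omega_m)) + 25.0


def absolute_magnitude(m_app, z, band, k_corr):
    """Equation (1): absolute magnitude K+E corrected to z = 0.1.

    k_corr is the K correction to z = 0.1, supplied by the caller.
    """
    e_corr = A_EVOL[band] * (z - 0.1)
    return m_app + DELTA_M[band] - distance_modulus(z) - k_corr - e_corr
```

For example, `absolute_magnitude(17.0, 0.1, "r", 0.0)` applies only the zero-point offset and the distance modulus, since the evolution correction vanishes at $z = 0.1$.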

In addition to the absolute magnitudes, we also compute
for each galaxy its stellar mass, $M_*$. Using the
relation between stellar mass-to-light ratio and color of
Bell et al. (2003), we obtain

$$\log\left[\frac{M_*}{h^{-2}M_\odot}\right] = -0.306 + 1.097\left[^{0.0}(g-r)\right] - 0.10 - 0.4\left(^{0.0}M_r - 5\log h - 4.64\right). \tag{2}$$

Here $^{0.0}(g-r)$ and $^{0.0}M_r - 5\log h$ are the $(g-r)$ color
and $r$-band magnitude $K + E$ corrected to $z = 0.0$, 4.64
is the $r$-band magnitude of the Sun in the AB system
(Blanton & Roweis 2007), and the $-0.10$ term effectively
implies that we adopt a Kroupa (2001) IMF (Borch et
al. 2006).
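
Equation (2) is simple enough to write as a one-line function. The sketch below is a direct transcription; the function and argument names are illustrative only.

```python
def log_stellar_mass(g_minus_r, abs_mag_r):
    """Equation (2): log10 of the stellar mass in units of h^-2 Msun.

    g_minus_r : (g - r) color, K+E corrected to z = 0.0
    abs_mag_r : r-band absolute magnitude M_r - 5 log h, K+E corrected to z = 0.0
    """
    return -0.306 + 1.097 * g_minus_r - 0.10 - 0.4 * (abs_mag_r - 4.64)
```

A galaxy with $^{0.0}(g-r) = 0.8$ and $^{0.0}M_r - 5\log h = -20$ gives $\log M_* = -0.306 + 0.8776 - 0.10 + 9.856 = 10.3276$.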

For a small fraction of all galaxies, the $g-r$ color that
results from the photometric SDSS pipeline is unreliable.
These galaxies typically have $g-r$ colors that are clearly
unrealistic (they are catastrophic outliers in the color-magnitude
distribution). If this is not accounted for,
equation (2) assigns these galaxies stellar masses that
are unrealistically high or low, which can have a dramatic
impact on our group finder (which assigns masses
to the groups based on their characteristic stellar mass;
see Section 3.5 below). To take account of these outliers
we proceed as follows. As shown by Baldry et al. (2004)
the distribution of $(g-r)$ colors at a given $r$-band magnitude
can be well approximated by a bi-Gaussian function,
representing the red sequence and the blue cloud.
Following Li et al. (2006) we therefore fit bi-Gaussian
functions to the distribution of $^{0.0}(g-r)$ for a total of
118 bins in $^{0.0}M_r - 5\log h$. As shown in Li et al. (2006)
these fits accurately capture the distribution of galaxies
in the color-magnitude plane. For any galaxy that falls
outside the 3-$\sigma$ ranges from the mean color-magnitude
relations of both the red sequence and the blue cloud
(~2% of all galaxies in Sample III), we compute its stellar
mass using the mean color of the red sequence (when
the galaxy is too red) or the blue cloud (when the galaxy
is too blue). Detailed tests have shown that this prevents
any problems with catastrophic outliers.
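
The outlier treatment described above can be sketched as follows. This is a minimal illustration that assumes the bi-Gaussian fit for the relevant magnitude bin has already been done and is supplied as four numbers; the function name and arguments are hypothetical. The text does not say what happens to a galaxy lying between the two sequences but outside both 3-$\sigma$ ranges, so the sketch leaves such a color unchanged, which is an assumption.

```python
def color_for_stellar_mass(color, red_mean, red_sigma,
                           blue_mean, blue_sigma, n_sigma=3.0):
    """Return the (g - r) color to be used in equation (2).

    Colors within n_sigma of either sequence are kept. Colors redder than
    the red-sequence range are replaced by the red-sequence mean, and colors
    bluer than the blue-cloud range by the blue-cloud mean.
    Assumes red_mean > blue_mean.
    """
    in_red = abs(color - red_mean) <= n_sigma * red_sigma
    in_blue = abs(color - blue_mean) <= n_sigma * blue_sigma
    if in_red or in_blue:
        return color
    if color > red_mean:
        return red_mean   # too red
    if color < blue_mean:
        return blue_mean  # too blue
    return color          # between the sequences: left unchanged (assumption)
```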

3. THE CONSTRUCTION OF THE GROUP 
CATALOGUE 

3.1. The Group Finder 

The group finder adopted here is similar to that devel- 
oped in Yang et al. (2005a). The strength of this group 
finder, hereafter referred to as the halo-based group 
finder, is that it is iterative and based on an adaptive 
filter modeled after the general properties of dark matter 
haloes. In addition, unlike the traditional FOF method,
this group finder can also identify groups with only a 
single member, which allows us to sample a wider dy- 
namic range in group masses. Note that various masses 
are used in our group finder and in the presentation. In 
order to avoid confusion, we list in Table 1 the various
masses that are used along with their definitions. 

The halo-based group finder consists of the following 
main steps: 

Step 1: Find potential group centers. We use
a combination of two different methods to identify the
centers (and members) of potential groups in redshift
space. First we use the traditional FOF algorithm (e.g.
Davis et al. 1985) with very small linking lengths in redshift
space to assign galaxies into tentative groups that
may represent the central parts of groups. The linking
lengths adopted are $\ell_z = 0.3$ along the line of sight, and
$\ell_p = 0.05$ in the transverse direction, both in units of the
mean separation of galaxies at the redshift in question.
The geometrical, luminosity-weighted centers of all the
FOF groups thus identified with two members or more
are considered as the centers of potential groups. Next,
all galaxies not yet linked to these FOF groups are
also treated as tentative centers of potential groups.
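
A toy version of Step 1 might look like the sketch below. It assumes that coordinates have already been converted to units of the local mean galaxy separation, and that two galaxies are linked when both their transverse and line-of-sight separations fall within the respective linking lengths (a cylindrical criterion, which is an assumption; the text only quotes the two lengths). The brute-force pair loop and the function names are illustrative and would not scale to the full survey.

```python
import math


def fof_groups(perp, los, ell_p=0.05, ell_z=0.3):
    """Friends-of-friends grouping with a union-find structure.

    perp : list of (x, y) transverse coordinates
    los  : list of line-of-sight coordinates
    Both are assumed to be in units of the mean galaxy separation.
    Returns a sorted list of groups, each a sorted list of galaxy indices.
    Unlinked galaxies come back as single-member groups, which Step 1
    also treats as tentative group centers.
    """
    n = len(los)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            d_perp = math.hypot(perp[i][0] - perp[j][0], perp[i][1] - perp[j][1])
            d_los = abs(los[i] - los[j])
            if d_perp <= ell_p and d_los <= ell_z:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def luminosity_weighted_center(members, perp, los, lum):
    """Luminosity-weighted center (x, y, los) of one group."""
    total = sum(lum[i] for i in members)
    x = sum(lum[i] * perp[i][0] for i in members) / total
    y = sum(lum[i] * perp[i][1] for i in members) / total
    w = sum(lum[i] * los[i] for i in members) / total
    return x, y, w
```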

Step 2: Determine the characteristic luminosity
of each tentative group. In order to be able to meaningfully
compare different groups, we define the group's
characteristic luminosity, $L_{19.5}$, as the combined
luminosity of all group members with $^{0.1}M_r - 5\log h < -19.5$
(here again, all absolute magnitudes are $K + E$
corrected to $z = 0.1$). For groups with redshifts $z < 0.09$
all galaxies with $^{0.1}M_r - 5\log h < -19.5$ make the flux
limit of the SDSS spectroscopic sample, and $L_{19.5}$ can be
computed directly using

$$L_{19.5} = \sum_i \frac{L_i}{C_i}, \tag{3}$$

where $L_i$ is the luminosity of the $i$th member galaxy, $C_i$
is the completeness of the survey at the position of that
galaxy, and the summation is over all group members
with $^{0.1}M_r - 5\log h < -19.5$. For groups with $z > 0.09$,
however, we need to correct for the missing members
with $^{0.1}M_{r,{\rm lim}} - 5\log h < {}^{0.1}M_r - 5\log h < -19.5$, with
$^{0.1}M_{r,{\rm lim}} - 5\log h$ the absolute magnitude limit at the
redshift of the group. In this case, we define the characteristic
luminosity as

$$L_{19.5} = \frac{1}{f(L_{19.5}, L_{\rm lim})} \sum_i \frac{L_i}{C_i}, \tag{4}$$

with $f(L_{19.5}, L_{\rm lim})$ a correction factor ($0 < f < 1$) that
accounts for the galaxies missed because of the apparent
magnitude limit of the spectroscopic survey. The method
of computing $f(L_{19.5}, L_{\rm lim})$ is described in §3.3 below.
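
Equations (3) and (4) amount to a completeness-weighted sum followed by an optional division by the correction factor. The sketch below assumes the correction factor is supplied by the caller (it equals 1 for groups at $z < 0.09$); the function and argument names are illustrative.

```python
def characteristic_luminosity(lums, completeness, abs_mags,
                              f_corr=1.0, mag_cut=-19.5):
    """Equations (3) and (4): completeness-weighted L_19.5 of one group.

    lums         : member luminosities L_i
    completeness : survey completeness C_i at each member's position
    abs_mags     : member absolute magnitudes M_r - 5 log h (z = 0.1 frame)
    f_corr       : correction factor f(L_19.5, L_lim); 1.0 for z < 0.09
    """
    total = sum(L / C for L, C, M in zip(lums, completeness, abs_mags)
                if M < mag_cut)
    return total / f_corr
```

With members of luminosity $(1, 2, 0.5) \times 10^{10}$, completeness $(1.0, 0.8, 1.0)$ and magnitudes $(-20, -21, -19)$, only the first two contribute, giving $L_{19.5} = 3.5 \times 10^{10}$ before any correction.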

Step 3: Estimate the mass, size and velocity dispersion
of the dark matter halo associated with
each tentative group. Using the value of $L_{19.5}$ determined
above and an assumption for the group mass-to-light
ratio, $M_h/L_{19.5}$, we assign each tentative group
a halo mass, which we use in the following steps to
assign group memberships.

In the first iteration we simply adopt a constant
mass-to-light ratio, $M_h/L_{19.5} = 500\,h\,M_\odot/L_\odot$, for all
groups. For all subsequent iterations, however, we use
the $M_h/L_{19.5}$ - $L_{19.5}$ relation obtained from the previous
iteration (using the method described in §3.5). Because
of this iterative technique the final group catalogue is
very insensitive to the (fairly arbitrary) initial guess of
$M_h/L_{19.5} = 500\,h\,M_\odot/L_\odot$ (see Yang et al. 2005a for detailed
tests). Note that the halo masses in this step are
estimated using the mass-to-light ratio, and agree well
with the final masses to be estimated in Section 3.5.

Throughout this paper we define dark matter haloes
as having an overdensity of 180. This implies, for the
WMAP3 cosmology adopted here, a halo radius of

$$r_{180} = 1.26\,h^{-1}{\rm Mpc}\left(\frac{M_h}{10^{14}\,h^{-1}M_\odot}\right)^{1/3}(1 + z_{\rm group})^{-1}, \tag{5}$$

where $z_{\rm group}$ is the redshift of the group center, and a
line-of-sight velocity dispersion of

$$\sigma = 397.9\,{\rm km\,s^{-1}}\left(\frac{M_h}{10^{14}\,h^{-1}M_\odot}\right)^{0.3214}. \tag{6}$$

The latter is a fitting function that accurately captures
the halo mass dependence of the one-dimensional velocity
dispersion as given by equation (14) in van den Bosch et
al. (2004), using the halo concentrations of Macciò et
al. (2007).
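
The scaling relations of Step 3 can be written down directly. The sketch below transcribes the first-iteration mass-to-light ratio and equations (5) and (6); the function names are illustrative, and masses are assumed to be in $h^{-1}M_\odot$ with luminosities in $h^{-2}L_\odot$.

```python
def halo_mass_from_luminosity(l_19_5, mass_to_light=500.0):
    """First-iteration halo mass [h^-1 Msun] from L_19.5 [h^-2 Lsun].

    mass_to_light is M_h / L_19.5 in units of h Msun/Lsun; later iterations
    would replace the constant by the relation from the previous iteration.
    """
    return mass_to_light * l_19_5


def halo_radius(m_h, z_group):
    """Equation (5): r_180 in h^-1 Mpc."""
    return 1.26 * (m_h / 1.0e14) ** (1.0 / 3.0) / (1.0 + z_group)


def velocity_dispersion(m_h):
    """Equation (6): line-of-sight velocity dispersion in km/s."""
    return 397.9 * (m_h / 1.0e14) ** 0.3214
```

A group with $L_{19.5} = 2 \times 10^{11}\,h^{-2}L_\odot$ is thus assigned $M_h = 10^{14}\,h^{-1}M_\odot$ in the first iteration, with $r_{180} = 1.26\,h^{-1}$Mpc at $z_{\rm group} = 0$ and $\sigma = 397.9\,{\rm km\,s^{-1}}$.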

Step 4: Update group memberships using halo
information. Once we have a group center and a tentative
estimate of the size, mass, and velocity dispersion
of the halo associated with it, we can assign galaxies to
this group using these halo properties. If we assume that
the distribution of galaxies in phase-space follows that of
the dark matter particles, the number density contrast
of galaxies in redshift space around the group center
(assumed to coincide with the center of the halo) at
redshift $z_{\rm group}$ can be written as

$$P_M(R, \Delta z) = \frac{H_0}{c}\,\frac{\Sigma(R)}{\bar\rho}\,p(\Delta z), \tag{7}$$

where $c$ is the speed of light, $\Delta z = z - z_{\rm group}$, $\bar\rho$ is the
average density of the Universe, and $\Sigma(R)$ is the projected
surface density of a (spherical) NFW (Navarro, Frenk &
White 1997) halo:

$$\Sigma(R) = 2\,r_s\,\bar\delta\,\bar\rho\,f(R/r_s), \tag{8}$$

with $r_s$ the scale radius,

$$f(x) = \begin{cases}
\dfrac{1}{x^2 - 1}\left[1 - \dfrac{\ln\left(\frac{1 + \sqrt{1 - x^2}}{x}\right)}{\sqrt{1 - x^2}}\right] & \text{if } x < 1 \\[2ex]
\dfrac{1}{3} & \text{if } x = 1 \\[2ex]
\dfrac{1}{x^2 - 1}\left[1 - \dfrac{\arctan\sqrt{x^2 - 1}}{\sqrt{x^2 - 1}}\right] & \text{if } x > 1
\end{cases} \tag{9}$$
TABLE 1

Various masses and their definitions

Name               Definition
$M_*$              stellar mass of a galaxy
$M_{\rm stellar}$  total stellar mass of group members with $^{0.1}M_r - 5\log h < -19.5$
$M_h$              true halo mass (unless stated otherwise)
$M_L$              halo mass estimated using the ranking of $L_{19.5}$
$M_S$              halo mass estimated using the ranking of $M_{\rm stellar}$





[Figure 2: six panels with vertical axes running from 0 to 1. Line styles denote the halo mass bins 12.5 < log M_h <= 13.0, 13.0 < log M_h <= 13.5, 13.5 < log M_h <= 14.0, 14.0 < log M_h <= 14.5 and 14.5 < log M_h <= 15.0.]

Fig. 2. The upper, middle and lower panels show the cumulative distributions of completeness, $f_c$ (the fraction of true members),
contamination, $f_i$ (the fraction of interlopers), and purity, $f_p$ (ratio between the true members and the group members). See text for the
detailed definitions of all three parameters. In the left- and right-hand panels these values are number weighted and luminosity weighted,
respectively. Different lines show the result for groups in haloes of different masses, as indicated. Results are plotted for groups with at
least 2 members, since groups with only 1 member have, by definition, $f_i = 0$.



Fig. 3. The global completeness, $f_{\rm halo}$, defined as the fraction
of haloes in the MGRS whose brightest member has actually been
identified as the brightest (central) galaxy of its group, as function
of the true halo mass. Results are shown for all haloes (dashed
histogram) and for those haloes with at least three members in the
MGRS (solid histogram).

and

$$\bar\delta = \frac{180}{3}\,\frac{c_{180}^3}{\ln(1 + c_{180}) - c_{180}/(1 + c_{180})}, \tag{10}$$

with $c_{180} = r_{180}/r_s$. The function $p(\Delta z)\,d\Delta z$ describes
the redshift distribution of galaxies within the halo, and
is assumed to have a Gaussian form,

$$p(\Delta z) = \frac{1}{\sqrt{2\pi}}\,\frac{c}{\sigma(1 + z_{\rm group})}\exp\left[\frac{-(c\Delta z)^2}{2\sigma^2(1 + z_{\rm group})^2}\right], \tag{11}$$

where $\sigma$ is the rest-frame velocity dispersion of equation (6).
Thus defined, $P_M(R, \Delta z)$ is the three-dimensional
density contrast in redshift space. In order
to decide whether a galaxy should be assigned to a particular
group we proceed as follows. For each galaxy we
loop over all groups, and compute the distance $(R, \Delta z)$
between the galaxy and the group center, where $R$ is
the projected distance at the redshift of the group. If
$P_M(R, \Delta z) > B$, with $B = 10$ an appropriately chosen
background level (see Yang et al. 2005a), the galaxy
is assigned to the group. If a galaxy can be assigned to
more than one group according to this criterion, it is only
assigned to the one for which $P_M(R, \Delta z)$ is the largest.
Finally, if all members in two groups can be assigned
to one group according to the above criterion, the two
groups are merged into a single group.
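
The membership criterion of Step 4 can be sketched in Python as below. The projected NFW shape function is the standard closed form for the NFW profile, and the mean density cancels between equations (7) and (8), so it never appears explicitly. The halo concentration `c180` is taken as an input because the concentration model used in the paper is not reproduced here; all function names and the dictionary layout passed to `assign_galaxy` are hypothetical. Distances are assumed to be in $h^{-1}$Mpc and velocities in km/s, and `r_proj` must be positive.

```python
import math

C_KM_S = 299792.458           # speed of light [km/s]
H0_OVER_C = 100.0 / C_KM_S    # H0/c in h Mpc^-1


def nfw_f(x):
    """Projected NFW shape function f(x) of equation (9), for x > 0."""
    if x < 1.0:
        u = math.sqrt(1.0 - x * x)
        return (1.0 - math.log((1.0 + u) / x) / u) / (x * x - 1.0)
    if x > 1.0:
        v = math.sqrt(x * x - 1.0)
        return (1.0 - math.atan(v) / v) / (x * x - 1.0)
    return 1.0 / 3.0


def delta_bar(c180):
    """Characteristic overdensity of equation (10)."""
    return 60.0 * c180 ** 3 / (math.log(1.0 + c180) - c180 / (1.0 + c180))


def p_delta_z(delta_z, sigma, z_group):
    """Gaussian redshift distribution of equation (11)."""
    s = sigma * (1.0 + z_group) / C_KM_S  # dispersion in redshift units
    return math.exp(-0.5 * (delta_z / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)


def density_contrast(r_proj, delta_z, r180, sigma, z_group, c180):
    """P_M(R, dz) of equation (7); the mean density cancels out."""
    r_s = r180 / c180
    sigma_over_rho = 2.0 * r_s * delta_bar(c180) * nfw_f(r_proj / r_s)
    return H0_OVER_C * sigma_over_rho * p_delta_z(delta_z, sigma, z_group)


def assign_galaxy(candidates, background=10.0):
    """Index of the group with the largest P_M above background, else None.

    candidates : one dict per group with the keyword arguments of
                 density_contrast, evaluated for the galaxy in question.
    """
    best_index, best_value = None, background
    for index, kwargs in enumerate(candidates):
        value = density_contrast(**kwargs)
        if value > best_value:
            best_index, best_value = index, value
    return best_index
```

For a halo with $r_{180} = 1.26\,h^{-1}$Mpc, $\sigma = 397.9\,{\rm km\,s^{-1}}$ and an illustrative concentration of 10, a galaxy at $R = 0.1\,h^{-1}$Mpc and $\Delta z = 0$ has a density contrast well above $B = 10$, whereas the same galaxy offset by five velocity dispersions along the line of sight falls far below it.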

Step 5: Iterate. Using the new memberships obtained
in Step 4, we re-compute the group centers and
go back to Step 2. This iterating process goes on until
there is no further change in the group memberships.
Next we use the resulting group catalogue to compute
$f(L_{19.5}, L_{\rm lim})$ and the relation between $M_h/L_{19.5}$ and
$L_{19.5}$ and we go back to Step 1. We stop this iteration
cycle once the $M_h/L_{19.5}$ - $L_{19.5}$ relation has converged,
which typically takes only 3 to 4 iterations.

3.2. Completeness, Contamination and Purity of the 
Group Catalogues 

To test its performance, we run our halo-based group 
finder over a detailed mock galaxy redshift survey 
(MGRS) that mimics the SDSS DR4. The MGRS is 
constructed by populating dark matter haloes in numer- 
ical simulations of cosmological volumes with galaxies of 
different luminosities, using the conditional luminosity 
function (CLF) model of van den Bosch et al. (2007, in 
preparation). This CLF describes the halo occupation 
statistics of SDSS galaxies, and accurately matches the 
SDSS luminosity function and the clustering properties 
of SDSS galaxies as function of their luminosity. We used 
a stack of simulations with different resolutions (100 and
$300\,h^{-1}{\rm Mpc}$ cubes with $512^3$ dark matter particles each)
to make sure that the mock catalogue is complete down
to the SDSS magnitude limit (see Yang et al. 2004 for
the stacking). Next an MGRS is constructed mimicking
the sky coverage of the SDSS DR4 and taking detailed
account of the angular variations in the magnitude limits
and completeness of the data (see Li et al. 2007 for
details). Methods like this are becoming widespread both
for understanding cluster detection (Yang et al. 2005a;
Gerke et al. 2005; Koester et al. 2007; Cohn et al. 2007)
and for quantifying selection functions (Rozo et al. 2007).

To assess the performance of the group finder we follow
Yang et al. (2005a) and proceed as follows. For each
group, $k$, we look up the halo ID, $h_k$, of the brightest
group member, and we define $N_t$ as the total number
of true members in the MGRS (with $0.01 < z < 0.20$)
that belong to halo $h_k$, $N_s$ as the number of these true
members that are selected as members of group $k$, $N_i$ as
the number of interlopers (group members that belong to
a different halo), and $N_g$ as the total number of selected
group members. These allow us to define, for each group,
the following three quantities:

• The completeness $f_c \equiv N_s/N_t$

• The contamination $f_i \equiv N_i/N_t$

• The purity $f_p \equiv N_t/N_g$

Since $N_g = N_i + N_s$, we have that $f_p = 1/(f_c + f_i)$. A
purity $f_p < 1$ implies that the number of interlopers is
larger than the number of missed true members, while
$f_p > 1$ implies that the group is not complete ($f_c < 1$)
and the number of missed true members is larger than
the number of interlopers. Note that the identity of the
halo that belongs to a group is solely based on the halo
ID of the brightest group member. Consequently, the
contamination $f_i$ can be larger than unity. An ideal,
perfect group finder yields groups with $f_c = f_p = 1$ and
$f_i = 0$. In the case of the halo-based group finder used
here, the value for the background level $B$ has been tuned
to maximize the average value of $f_c(1 - f_i)$ (see Yang et
al. 2005a).
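
The three statistics defined above can be computed from two sets of galaxy identifiers, as in the following sketch. The function name and the use of Python sets are illustrative; `true_members` is assumed to hold the mock galaxies of the halo that hosts the group's brightest member.

```python
def group_statistics(true_members, group_members):
    """Completeness, contamination and purity of one group.

    true_members  : IDs of all mock galaxies in the halo of the group's
                    brightest member (N_t of them)
    group_members : IDs of all galaxies selected as group members (N_g)
    """
    true_members = set(true_members)
    group_members = set(group_members)
    n_t = len(true_members)
    n_g = len(group_members)
    n_s = len(true_members & group_members)  # recovered true members
    n_i = n_g - n_s                          # interlopers
    return {"f_c": n_s / n_t, "f_i": n_i / n_t, "f_p": n_t / n_g}
```

A halo with true members {1, 2, 3, 4} recovered as the group {1, 2, 3, 9} has $f_c = 0.75$, $f_i = 0.25$ and $f_p = 1$: one missed member is balanced by one interloper. A two-member halo recovered as a four-member group with three interlopers has $f_i = 1.5$, illustrating that the contamination can exceed unity.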

Results obtained from the MGRS are shown in Fig. 2.
Since groups with a single member have zero contamination
($f_i = 0$) by definition, results are only shown for
groups with a richness $N \ge 2$. The upper left-hand panel
of Fig. 2 shows the cumulative distributions of the completeness
$f_c$. Different line-styles correspond to groups
of different true halo masses, as indicated. The fraction
of groups with 100 percent completeness (i.e., with
$f_c = 1$) depends on group mass, and ranges from 95%
for low mass groups to 60% for the most massive clusters.
Since our group finder is tuned to maximize the
average value of $f_c(1 - f_i)$, massive groups with larger
velocity dispersions have larger $f_i$ due to the contamination
of foreground and background galaxies. A compromise
between $f_c$ and $f_i$ leads to smaller $f_c$ for more
massive groups. Almost independent of group mass, we
find that more than 90% of all groups have a completeness
$f_c > 0.6$, while an average of 80% of all groups have
$f_c > 0.8$. The middle left-hand panel of Fig. 2 shows the
cumulative distributions of the contamination $f_i$. On
average, around 65% of the groups have zero contamination
($f_i = 0$), while ~85% of the groups have $f_i < 0.5$,
again virtually independent of group mass. These interlopers
(contamination) are either nearby field galaxies or
the member galaxies of nearby massive groups, especially
those along the line of sight. Finally, the lower left-hand
panel shows the cumulative distributions of the purity $f_p$,
indicating that there are on average as many groups with
$f_p < 1$ as with $f_p > 1$. The break at $f_p = 1$ means that the
number of recovered group members is about the same
as the number of true members. Thus, the sharper the
break is, the better. An ideal situation is a step function
at $f_p = 1$. In addition, only a negligibly small fraction
of groups have $f_p < 0.5$, while only for the most massive
haloes is there a significant fraction (~10%) with
$f_p > 1.5$.

Fig. 4. Upper panels: The fraction of the characteristic luminosity $L_{19.5}$ that is contributed by galaxies above a given magnitude limit
as a function of $L_{19.5}$. Left-hand, middle and right-hand panels correspond to magnitude limits of $^{0.1}M_{r,{\rm lim}} - 5\log h = -20.0$, $-20.5$
and $-21.0$, respectively. Each dot corresponds to a group in our SDSS group catalogue based on Sample II with $z < 0.09$. The open circles
indicate the mean fractions for a given bin in $L_{19.5}$, while the solid line is the exponential function that best fits these mean values, and
which defines our completeness correction factor $f(L_{19.5}, L_{\rm lim})$ discussed in the text. Lower panels: same as the upper panels, except that
here we plot the fraction of characteristic stellar mass, $M_{\rm stellar}$, contributed by galaxies above a given magnitude limit.

We also determine the completeness, contamination 
and purity in terms of the total luminosity rather than 
the number of member galaxies. The corresponding results
are shown in the right-hand panels of Fig. 2.
As one can see, the results are very similar to
those in terms of the number of members.

As a final, quantitative assessment of the performance
of our halo-based group finder, we examine the global
completeness, $f_{\rm halo}$, defined as the fraction of haloes in
the MGRS whose brightest member has actually been
identified as the brightest (central) galaxy of its group.
Fig. 3 shows $f_{\rm halo}$ obtained from our CLF mock for haloes
with $N_t \ge 1$ (dashed lines) and $N_t \ge 3$ (solid lines) as
functions of the true halo mass. As one can see, the group
finder successfully selects more than 90% of all the true
haloes with masses $> 10^{12}\,h^{-1}M_\odot$ almost independent of
their richness and with only a very weak dependence on 
halo mass. Note that this does not imply that ~ 10% of 
the central galaxies in dark matter haloes have not been 
selected by the group finder. Especially for the more 
massive haloes, the vast majority of these central galaxies 
have been selected as a group member, but they are not 
the brightest group member. This can happen whenever 
two nearby haloes are merged into a single group by the 
group finder. 



3.3. Completeness Corrections for the Characteristic 
Luminosity and Stellar Mass 

An important parameter for each group is its characteristic 
luminosity, defined by equation (4). Since the 
correction factor f(L_19.5, L_lim) depends on the characteristic 
luminosity L_19.5 itself, it can only be determined 
in an iterative way. In the first iteration of our group 
finder, we use 

    f(L_{19.5}, L_{\rm lim}) = \frac{\int_{L_{\rm lim}}^{\infty} L \, \phi(L) \, dL}{\int_{L_{\rm cut}}^{\infty} L \, \phi(L) \, dL}

where L_cut is the luminosity that corresponds to ^{0.1}M_r − 
5 log h = −19.5, and φ(L) is the galaxy luminosity function, 
here assumed to be that obtained by Blanton et 
al. (2003b). However, as discussed in Yang et al. (2005a), 
it is not reliable to make the correction based on the 
assumption that the galaxy luminosity function in groups 
of a given mass is the same as that of the total galaxy 
population. After all, the conditional luminosity function 
of galaxies in groups varies significantly with group 
mass (Yang et al. 2005c; Zheng et al. 2005). Therefore, 
in the following iterations we use the group catalogue 
of the previous iteration to self-calibrate f(L_19.5, L_lim). 
To do so, we first select all groups with z < 0.09 (for 
which f = 1) and compute their characteristic luminosities. 
Next, we use these groups to determine the fraction 
of the characteristic luminosity that is contributed 
by group members with L > L_lim for different values 
of L_lim. The upper panels of Fig. 4 show the results 
for three different values of L_lim (corresponding 
to ^{0.1}M_{r,lim} − 5 log h = −20.0, −20.5 and −21.0, from 
left to right). Next we determine the mean values of 
these fractions as a function of L_19.5, which are shown in 
Fig. 4 as open circles, and we define f(L_19.5, L_lim) as 
the exponential function that best fits these mean values 
(shown as solid lines in Fig. 4). Note, however, that the 
scatter around these mean values is fairly large. Consequently, 
despite the fact that our correction factors are 
self-calibrated in an iterative way, they are only accurate 
in a statistical sense, and are not expected to be accurate 
for individual groups. As we will show in Section 3.5, a 
considerable amount of scatter in the halo masses can be 
introduced by such a correction. 
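
The first-iteration correction can be illustrated with a short numerical sketch. It assumes that φ(L) has a Schechter form and takes f to be the fraction of the luminosity density above L_cut that is contributed by galaxies above L_lim. The parameter values (alpha, l_star) and the function names are hypothetical placeholders, not the Blanton et al. (2003b) values, and the simple midpoint integration in ln L is our own choice.

```python
import math


def luminosity_density_above(l_min, alpha=-1.05, l_star=1.0,
                             n_steps=4000, l_max_factor=50.0):
    """Integrate L * phi(L) dL from l_min up to l_max_factor * l_star.

    phi(L) is a Schechter function with unit normalisation (it cancels in
    the ratio below). alpha and l_star are placeholder values only.
    """
    lo, hi = math.log(l_min), math.log(l_max_factor * l_star)
    if lo >= hi:
        return 0.0
    step = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        lum = math.exp(lo + (i + 0.5) * step)   # midpoint in ln L
        x = lum / l_star
        phi = x ** alpha * math.exp(-x) / l_star
        total += lum * phi * lum * step         # dL = L d(ln L)
    return total


def first_iteration_f(l_lim, l_cut, **schechter):
    """Fraction of the luminosity above l_cut that comes from L > l_lim."""
    if l_lim <= l_cut:
        return 1.0  # every galaxy above l_cut is observable: no correction
    return (luminosity_density_above(l_lim, **schechter)
            / luminosity_density_above(l_cut, **schechter))
```

In the later iterations this analytic guess would be replaced by the exponential fit to the mean fractions measured from the z < 0.09 groups themselves.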

Similar to the characteristic luminosity defined above, 
we also define a characteristic stellar mass 

    M_{\rm stellar} = \frac{1}{g(L_{19.5}, L_{\rm lim})} \sum_i \frac{M_{{\rm stellar},i}}{C_i}    (13)

where, as for the characteristic luminosity, the summation 
is over all group members with ^{0.1}M_r − 5 log h < 
−19.5, and g(L_19.5, L_lim) is a correction factor similar to 
f(L_19.5, L_lim) but tailored to the stellar mass rather than 
the r-band luminosity. Like f, these correction factors 
can be self-calibrated, and the results are shown in 
the lower panels of Fig. 4. 

3.4. Correcting for Survey Edges 

An additional incompleteness effect that needs to be 
accounted for is due to the survey geometry. A group 
whose projected area straddles one or more survey edges 
may have members that fall outside of the survey, thus 
causing an incompleteness, which in turn affects our mass 
estimate of the group. The geometry of the survey used 
here is defined as the region on the sky where the SDSS 
redshift completeness C > 0.7, and is indicated by the 
red areas in Fig. 1. Clearly, this geometry is fairly 
complicated, which can potentially have a significant impact 
on various statistics of the group catalogue (cf. Cooper 
et al. 2005; Berlind et al. 2006). In order to correct for 
these edge effects we proceed as follows: 

First, we estimate the mass for each group using the 
method described in Section 3.5 below without taking 
edge effects into account. We then randomly distribute 
200 points within the corresponding halo radius r_180 
(which we compute using Eq. 5). Next we apply the 
SDSS DR4 survey mask and remove those random points 
that fall outside of the region where C > 0.7. For each 
group we then compute the number of remaining points, 
N_remain, and we define f_edge = N_remain/200 as a measure 
of the volume of the group that lies within the survey 
edges. Finally we multiply L_19.5 and M_stellar by 1/f_edge 
to correct for the 'missing members' outside of the edges 
of the survey. Tests with MGRSs show that this correction 
works well, except for groups with a small f_edge. We 
therefore discard those groups with f_edge < 0.6, which 
removes (only) 1.6% of all groups. After this correction 
for edge effects, we re-calculate the mass for each group 
as described in Section 3.5. The mass difference before 
and after this edge effect correction is relatively small: in 
most cases less than 3% and on average less than 10%. 
Since this change in group mass translates into only very 
small changes in r_180, no iteration of this procedure is 
required. 
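
The edge correction described above can be sketched as a small Monte Carlo estimate. The sketch works in flat projected coordinates and samples the points uniformly within a disc, which is our simplifying assumption. The `in_survey` callback stands in for the SDSS DR4 mask (the C > 0.7 region), and all function names are hypothetical.

```python
import math
import random


def edge_fraction(center_x, center_y, radius, in_survey,
                  n_points=200, rng=None):
    """Fraction of random points within `radius` that fall inside the survey.

    `in_survey(x, y)` is a stand-in for the real survey mask; the flat
    projected geometry is an assumption of this sketch.
    """
    rng = rng or random.Random()
    inside = 0
    for _ in range(n_points):
        # Uniform sampling within a disc requires r proportional to sqrt(u).
        r = radius * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        x = center_x + r * math.cos(phi)
        y = center_y + r * math.sin(phi)
        if in_survey(x, y):
            inside += 1
    return inside / n_points


def correct_for_edges(l_195, m_stellar, f_edge, f_min=0.6):
    """Scale both mass indicators by 1/f_edge; discard groups below f_min."""
    if f_edge < f_min:
        return None  # group is removed from the catalogue
    return l_195 / f_edge, m_stellar / f_edge
```

A group that lies fully inside the survey returns f_edge = 1 and is left unchanged, while a group cut in half by a straight edge returns f_edge close to 0.5 and is discarded by the f_edge < 0.6 criterion.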

3.5. Estimating Group Masses 



An important aspect of each galaxy group catalogue 
is the determination of the masses of the groups. Most 
studies infer the (dynamical) group mass from the velocity 
dispersion of their member galaxies. However, the 
vast majority of the groups in our sample contain only a 
few members, making a dynamical mass estimate based 
on these members extremely unreliable. Mass estimates 
based on gravitational lensing (either strong or weak) or 
on X-ray emission can also only be applied to the most 
massive systems. Furthermore, these latter two methods 
require high-quality data in excess of the information 
directly available from the redshift survey used to construct 
the group catalogue, rendering them impractical. 

Rather, we estimate the group masses from their 
characteristic luminosities or characteristic stellar masses. 
This has the advantage that (i) it is equally applicable to 
groups spanning the entire range in richness, and (ii) it 
does not require any additional data. As demonstrated 
in Yang et al. (2005a), the mass of a dark matter halo 
associated with a group is tightly correlated with the 
total luminosity of all member galaxies down to some 
luminosity. This is further illustrated in Fig. 5, where 
we plot the correlations between the halo mass, M_h, and 
the characteristic stellar mass, M_stellar (left-hand panel), 
and characteristic luminosity, L_19.5 (right-hand panel), in 
the semi-analytical model of Kang et al. (2005). Clearly, 
both M_stellar and L_19.5 are tightly correlated with halo 
mass, with the M_stellar−M_h relation being slightly tighter 
than that between L_19.5 and M_h. This is expected, since 
M_stellar is less affected by the current amount of star 
formation, and suggests that the characteristic stellar mass 
is a somewhat better mass indicator than the characteristic 
luminosity. On the other hand, the luminosities are 
directly observed, while the stellar masses are derived 
quantities, which creates additional scatter. Therefore, 
we will compute two mass estimates for each group: M_S, 
based on the characteristic stellar mass M_stellar, and M_L, 
based on the characteristic luminosity L_19.5. Throughout, 
we will compare all results from the group catalogue 
for both mass estimates. 

In order to convert the characteristic luminosities and 
stellar masses to halo masses, we make the assumption 
that there is a one-to-one relation between L_19.5 (or 
M_stellar) and M_h. For a given (comoving) volume and 
a given halo mass function, n(M_h), one can then link the 
characteristic luminosity or stellar mass to a halo mass 
by matching their rank orders. Note, however, that this 
only works for a group sample that is complete in L_19.5 
or M_stellar. In Fig. 6 we plot the redshift distributions of 
groups in three different ranges of L_19.5 (upper panels) 
and M_stellar (lower panels). Comparing these distributions 
with that expected for a constant number density 
(shown as the solid line), we obtain the rough redshifts 
out to which these different samples are complete. In 
Table 2 we list the redshift limits thus obtained for 
the three different bins of mass indicators, along with 
the numbers of groups in each of the complete samples. 
Only groups in these complete samples are used in the 
ranking; the masses of other groups are estimated by linear 
interpolation of the relations between M_h and each of 
the mass indicators obtained from the complete samples. 
Because of the particular volume limited samples used, 
we can assign group masses down to 10^^-^ H'^Mq. 
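
The rank-order matching can be sketched as follows for a single complete sample: the group with the N-th largest mass indicator in a comoving volume V is assigned the halo mass above which the cumulative halo number density equals N/V. The cumulative mass function used here is a toy power law chosen only so that the example runs; it is a hypothetical placeholder and not the Warren et al. (2006) mass function, and the function names are our own.

```python
def toy_cumulative_halo_density(log_mass):
    """Hypothetical n(>M) per unit comoving volume; a placeholder only."""
    return 10.0 ** (-2.0 - 0.9 * (log_mass - 12.0))


def mass_at_density(target_density, n_above, lo=8.0, hi=17.0, tol=1e-6):
    """Bisect for log M with n_above(log M) = target_density.

    Assumes n_above decreases monotonically with mass and that the
    solution lies between lo and hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_above(mid) > target_density:
            lo = mid   # too many haloes above mid: solution is more massive
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rank_match(indicators, volume, n_above=toy_cumulative_halo_density):
    """Return log halo masses (in input order) by matching rank orders.

    `indicators` holds L_19.5 or M_stellar for the groups of one complete
    sample; `volume` is the comoving volume of that sample.
    """
    order = sorted(range(len(indicators)),
                   key=lambda i: indicators[i], reverse=True)
    log_masses = [0.0] * len(indicators)
    for rank, idx in enumerate(order, start=1):
        log_masses[idx] = mass_at_density(rank / volume, n_above)
    return log_masses
```

By construction the assigned masses reproduce the adopted halo mass function exactly, which is why only complete samples can be used for the ranking.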

Fig. 5. — The distributions of characteristic stellar mass M_stellar (total stellar mass of galaxies with ^{0.1}M_r − 5 log h < −19.5 in a halo, 
left-hand panel) and luminosity L_19.5 (total luminosity of galaxies with ^{0.1}M_r − 5 log h < −19.5 in a halo, right-hand panel) as a function 
of halo mass, M_h. The results are obtained from the semi-analytical model of Kang et al. (2005). Both M_stellar and L_19.5 are 
tightly correlated with the halo mass. 
Fig. 6. — The redshift distributions of groups for three bins in characteristic luminosity L_19.5 (upper panels) and characteristic stellar 
mass M_stellar (lower panels), as indicated. Solid lines indicate the expected values for a constant group number density. Whenever the 
observed distribution of groups starts to systematically drop below this line, we consider the sample incomplete. The vertical lines indicate 
the redshifts out to which we consider the samples complete. In the right-hand panels, the group samples are considered complete out to 
the redshift limit of our galaxy sample (z = 0.2). 



Clearly, the assumption of a one-to-one relation between 
the characteristic luminosity or stellar mass and 
the halo mass is oversimplified. In reality, these relations 
contain some scatter, which results in errors in our 
inferred group masses. However, detailed tests with mock 
galaxy redshift surveys have shown that this method 
nevertheless allows for a very accurate recovery of average 
halo occupation statistics. In particular, the group 
finder yields average halo occupation numbers and average 
mass-to-light ratios that are in excellent agreement 
with the input values (Yang et al. 2005b; Weinmann et 
al. 2006a). An additional shortcoming of our method is 
that it requires the halo mass function, which is cosmology 
dependent (e.g. Sheth, Mo & Tormen 2001; Warren et 
al. 2006). However, as we will show in Section 4.1, it 

* Throughout we compute the halo mass function using the for- 
mulae given in Warren et al. (2006) with the transfer function 
Fig. 7. — Panel (a): comparison between the assigned halo mass M_L, based on the characteristic group luminosity L_19.5, and the true 
halo mass M_h. These results are obtained from the mock group catalogue constructed from our MGRS. The small panel plots the standard 
deviation in Q, defined by equation (14), and reflects the amount of scatter with respect to the line of equality M_L = M_h (shown as a 
solid line in the scatter plot). Panel (b): same as panel (a) except that here we show the comparison between M_h and the assigned halo 
mass M_L estimated from the ranking of the characteristic group luminosity L_19.5 obtained directly from the true group members in the 
simulation box used to construct the MGRS. Panel (c): same as panel (a) but this time only plotting the results for groups with z < 0.09, 
for which one does not need to correct L_19.5 for missing members. Panel (d): same as panel (a) but where we have mimicked a perfect 
group finder without interlopers (f_i = 0) and with a completeness f_c = 1. See text for a detailed discussion. 
TABLE 2 

Complete Samples used for Mass Ranking 

Redshift           log L_19.5      Groups                log M_stellar    Groups 
                                   Samples I/II/III                       Samples I/II/III 
(1)                (2)             (3)                   (4)              (5) 

0.01 < z < 0.20    > 10.9          7583/7683/8409        > 11.3           11740/11851/12012 
0.01 < z < 0.15    [10.2, 10.9]    75120/75306/66001     [10.7, 11.3]     53248/53377/47953 
0.01 < z < 0.08    [9.6, 10.2]     33898/33939/36038     [10.0, 10.7]     32739/32789/32702 

Note. — Properties of the three complete samples used to estimate group mass via the ranking of characteristic luminosity or characteristic 
stellar mass. Column (1) lists the redshift range of each sample. Columns (2) and (4) list the corresponding ranges in log L_19.5 and log M_stellar, 
respectively. Finally, columns (3) and (5) list the corresponding numbers of groups in the catalogues based on galaxy samples I, II and III, 
respectively. 
is extremely easy to convert the group masses to another 
cosmology, without having to rerun the group finder. 

In order to further assess the reliability of the halo 
masses assigned to individual groups, we use the mock 
group catalogue obtained from the CLF-based mock. 
Following the procedure described above we assign each 
(mock) group a halo mass M_L based on the ranking of its 
characteristic group luminosity L_19.5. The top-left panel 
of Fig. 7 shows the M_L thus obtained versus the true 
halo mass, M_h, defined as the mass of the dark matter 
halo that hosts the brightest group galaxy. In order to 
quantify the scatter with respect to the line of equality 
(M_L = M_h), we determine for each group the quantity 

    Q \equiv \frac{1}{\sqrt{2}} \left[ \log(M_L) - \log(M_h) \right]    (14)

and measure the standard deviation, σ_Q, in several bins 
of [log(M_L) + log(M_h)]/2. The results, shown in the 
small panel, indicate that the scatter is ~ 0.35 dex for 
groups with lO^^ /j-IMq < Ml < lO"'^ h-^Mg, drop- 
ping to ~ 0.2 dex at the high and low mass ends. 
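
The scatter measurement can be sketched as below. The sketch takes Q as the perpendicular offset from the line of equality, i.e. with a 1/√2 prefactor (an assumption about the normalisation), and uses the population standard deviation. The function name and the choice of bin edges are hypothetical.

```python
import math
import statistics


def scatter_in_bins(log_m_assigned, log_m_true, bin_edges):
    """Standard deviation of Q in bins of the mean of the two log masses.

    Returns one value per bin, or None for bins with fewer than two groups.
    The 1/sqrt(2) factor in Q is an assumption of this sketch.
    """
    samples = [[] for _ in range(len(bin_edges) - 1)]
    for m_a, m_t in zip(log_m_assigned, log_m_true):
        q = (m_a - m_t) / math.sqrt(2.0)
        mean_log_m = 0.5 * (m_a + m_t)
        for k in range(len(bin_edges) - 1):
            if bin_edges[k] <= mean_log_m < bin_edges[k + 1]:
                samples[k].append(q)
                break
    return [statistics.pstdev(s) if len(s) >= 2 else None for s in samples]
```

Binning in the mean of the two log masses, rather than in either mass alone, treats the assigned and true masses symmetrically.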

There are several factors that contribute to this scatter. 
The first is the intrinsic scatter in the relation between 
the halo mass and the true value of L_19.5. The 
upper right-hand panel of Fig. 7 shows the relation between 
the true halo mass and the assigned mass based 
on the ranking of the true L_19.5 obtained from the CLF 
mock before incorporating any observational effects (e.g. 
magnitude limit, incompleteness and survey boundary). 
In other words, we measure L_19.5 using all mock galaxies 
with ^{0.1}M_r − 5 log h < −19.5, independent of whether 
those galaxies would be incorporated in the mock survey 
or not. The resulting scatter is about 0.2 dex, similar 
to that of the semi-analytical model shown in the right-hand 
panel of Fig. 5. 

The second source of scatter owes to the incompleteness 
and contamination introduced by our group finder. 
The lower left-hand panel of Fig. 7 shows the relation 
between the assigned mass and the true mass for groups 
with z < 0.09. As discussed in Section 3.3, the characteristic 
luminosity of these groups does not need to be corrected 
for incompleteness due to the magnitude limit of 
the survey (i.e., all galaxies with ^{0.1}M_r − 5 log h < −19.5 
make the magnitude limit of the survey). The scatter 
here is only marginally larger than the intrinsic scatter 
shown in the top-right panel, suggesting that the group 
finding algorithm by itself only introduces a very small 
amount of uncertainty in the assigned masses. 

The final source of scatter in the assigned group masses 
owes to the fact that for groups with z > 0.09 we need to 
correct the characteristic luminosity for the group members 
that do not meet the magnitude limit of the survey. 
As shown in Fig. 4, this can introduce a considerable 
amount of scatter. To assess its impact on the inferred 
halo masses we proceed as follows. We group all galaxies 
in the mock SDSS DR4 according to the halo to which 
they belong. The resulting 'group catalogue' has, by 
construction, a completeness f_c = 1, an interloper fraction 
f_i = 0, and a purity f_p = 1. For each group in this 
perfect catalogue we estimate the characteristic luminosity 
L_19.5: for groups with z < 0.09 we simply sum the 
luminosities of all galaxies with ^{0.1}M_r − 5 log h < −19.5, 
while for groups with z > 0.09 we use the correction factors 
f(L_19.5, L_lim) as described in Section 3.3. Finally, 
we assign each group a mass M_L based on the ranking 
of L_19.5 as described above. The lower right-hand panel 
of Fig. 7 plots the resulting M_L as a function of the true 
halo mass M_h. The scatter is ~ 0.25 dex, comparable to 
that in the lower left-hand panel. 

We therefore conclude that the majority of the scat- 
ter in the relation between the true and assigned halo 
masses owes to the intrinsic scatter in the relation be- 
tween halo mass and its characteristic luminosity. The 
fact that the group finder is not perfect (i.e., suffers from 
interlopers and incompleteness) and that we need to cor- 
rect the characteristic luminosity for members that do 
not make the magnitude limit of the survey, only adds a 
relatively small contribution to the total scatter. 

4. BASIC PROPERTIES OF THE GROUP 
CATALOGUE 

Application of our halo-based group finder to the SDSS 
DR4 data set described in Section 2 results in 295992, 
301237 and 300049 groups for Samples I, II and III, 
respectively. In what follows we present a few global 
properties of these group catalogues. 

Table 3 lists, for each of the three samples, the number 
of groups with 1, 2, 3, and more than 3 members: 
clearly, the majority of the groups contain only a single 
member. Note also that Sample III yields many more 
systems with richness N > 2 than Samples I and II; this 
is simply due to the fact that almost all 38672 galaxies 
with an assigned redshift are members of such systems. 
As shown in Zehavi et al. (2002), about 40% of these assigned 
redshifts have an error of more than 500 km s^-1. This 
means that in most cases these galaxies should not have 
been assigned to the group in question (i.e., they are 
interlopers), which obviously causes a systematic bias 
towards too many members per group. On the other 
hand, not taking account of the galaxies lost because of 
fiber collisions results in an opposite bias towards too few 
members per group. We can assess the impact of these 
biases on the group catalogue by comparing results 
obtained from Samples II and III. We will come back to 
this issue later in this section. 

As an illustration, Fig. 8 shows the distributions of 
galaxies and groups in a 3° slice. As expected, massive 
groups are located in the denser regions of the galaxy 
density field, while groups with lower masses are more 
diffusely distributed. The clustering properties of these 
groups directly reflect the clustering properties of dark 
matter haloes, and can thus be used to directly probe the 
mass dependence of the halo bias (cf. Yang et al. 2005b; 
Coil et al. 2006; Berlind et al. 2007). We defer a more 
detailed analysis of the clustering properties of the groups 
in the SDSS DR4 group catalogues presented here to a 
forthcoming paper. 

Fig. 9 plots the number of groups as a function of 
group richness (left panel), redshift (middle panel), and 
halo mass (right panel). Group redshift is estimated using 
the luminosity-weighted average of all member galaxies. 
Dashed, solid and dotted histograms correspond to 
the group catalogues based on Samples I, II and III, 
respectively. As already mentioned above, groups obtained 
from Sample III are systematically richer than those 
obtained from the other two samples, which simply owes 
to the fact that all galaxies with an assigned redshift are 
group members. However, as is evident from the middle 



Galaxies (N = 23594) 







Fig. 8. — The large wedge shows the distribution of a subset of SDSS DR4 galaxies in a 3° slice in the south galactic pole region of the 
SDSS. These distributions are repeated in the smaller wedges, where we overplot, as (red) open circles, the groups with assigned masses in 
the range lO-*^"^ h~^MQ to 10-'^'* h~^'MQ (lower-left wedge) and > 10-'^'' h~^'MQ (lower-right wedge). Note the halo masses used in this plot 
are obtained from the ranking of the characteristic luminosity L19.5. 



TABLE 3 

Number of Galaxies and Groups in the SDSS DR4 

Samples       Galaxies    Groups    N = 1     N = 2    N = 3    N > 3 
(1)           (2)         (3)       (4)       (5)      (6)      (7) 

Sample I      362356      295992    266763    19522    4511     5196 
Sample II     369447      301237    271420    19868    4619     5330 
Sample III    408119      300049    250492    33537    7848     8172 

Note. — For each of the three samples, columns (2) and (3) list the number of galaxies and of groups, respectively. In addition, columns (4)-(7) 
list the numbers of groups with 1, 2, 3, and more than 3 members. 
Fig. 9. — The number of groups as a function of the number of group members (left-hand panel), group redshift (middle panel) and assigned 
halo mass (right-hand panel). The dashed lines, histograms and dotted lines show the results for the group catalogues based on Samples 
I, II and III, respectively. For comparison, in the right-hand panel we also plot the theoretical halo mass function (long-dashed line). For 
log(M_L/h^{-1} M_⊙) > 13 the mass function of the groups is in excellent agreement with this theoretical mass function, indicating that our 
group sample is complete for this mass range. For lower mass groups, the sample is only complete out to lower redshifts (cf. Table 2 and 
Fig. 6). Note that in the right-hand panel the group masses have been assigned based on their characteristic luminosity L_19.5. Using the 
characteristic stellar mass, M_stellar, instead results in an almost identical plot. 
and right-hand panels, the redshift distributions and halo 
mass functions of all three group samples are extremely 
similar: although the inclusion of galaxies with assigned 
redshifts changes the richness of the systems, their red- 
shifts and inferred masses are virtually unaffected. The 
long-dashed line in the right-hand panel shows the the- 
oretical mass function of dark matter haloes over the 
redshift range 0.01 < z < 0.20. The mismatch between 
the group mass function and this theoretical halo mass 
function at log[AfL/ /i-^Mq] < 10^^-^ is caused by the 
incompleteness of the group catalogue shown in Fig. 6 
and discussed in Section 3.5. If we were to plot only the 
mass functions for groups in the complete samples of 
Table 2, they would, by construction, perfectly match their 
theoretical equivalent. 

As discussed in Section 3.5, group masses are obtained 
down to 10^^'^ H^^Mq using two different mass indicators: 
the characteristic luminosity L_19.5 and the characteristic 
stellar mass M_stellar. The left-hand panel of 
Fig. 10 compares the inferred group masses M_L and M_S, 
obtained using L_19.5 and M_stellar, respectively. Overall, 
both halo masses agree very well with each other, with an 
average scatter that decreases from ~ 0.1 dex at the low 
mass end to ~ 0.05 dex at the massive end. This scatter 
is expected, and mainly reflects that galaxies of a given 
luminosity have different colors, and therefore different 
(inferred) stellar masses. The effect is somewhat larger 
for lower mass groups simply because their characteristic 
mass and luminosity are dominated by a smaller number 
of galaxies. 

Finally, to assess how the uncertainties in the correction 
for fiber collisions affect the group catalogue, we 
compare the masses of groups in Sample II with those of 
their counterparts in Sample III. Here a group in Sample 
III is defined as the counterpart of a group in Sample 
II if it has the same brightest (central) galaxy. We can 
find a Sample III counterpart for ~ 95% of all groups 
in Sample II. There are two main reasons why a group 
may not have a counterpart. First of all, in about 3% 
of the groups in Sample III, the brightest group member 
is actually a galaxy with an assigned redshift, so that 
this group cannot have a counterpart in Sample II. In 
addition, about 1% of the groups in Sample II merge 
with other (nearby) groups when the additional galaxies 
with assigned redshifts are used. Consequently, in 
terms of their brightest galaxies, some groups in Sample 
II have disappeared in Sample III (i.e., do not have a 
counterpart), while others have suddenly increased their 
mass substantially because they have now merged with 
another group. The right-hand panel of Fig. 10 plots 
the relation between the assigned halo mass of a group 
in Sample II and that of its counterpart in Sample III. 
The relation is extremely tight: for more than 90% of 
the systems the difference in the assigned group mass is 
smaller than 50%, at any given mass. This emphasizes 
once again that although the incompleteness due to fiber 
collisions can have a significant effect on the richness of 
individual groups, it does not have a significant impact 
on which systems are selected as groups or on their 
assigned masses. Although the standard deviation in Q 
(defined by equation (14), but with M_L and M_h replaced 
by M_L,II and M_L,III, respectively) reaches up to ~ 0.25 
dex, this largely owes to a very small fraction of outliers 
that are clearly visible in the scatter plot. In particular, 
one can distinguish a second 'sequence' of systems 
with M_L,III > M_L,II: this corresponds to the 1% of the 
groups in Sample II mentioned above that have merged 
due to the additional galaxies with assigned redshifts. 
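
The counterpart matching between two catalogues can be sketched with a simple dictionary lookup on the brightest galaxy of each group. The data layout (group identifier mapped to the identifier of its brightest galaxy) and the function names are hypothetical, and we assume that 'a difference smaller than 50%' refers to the fractional change relative to the Sample II mass.

```python
def match_counterparts(brightest_ii, brightest_iii):
    """Map each Sample II group to the Sample III group with the same
    brightest galaxy.

    Both inputs map a (hypothetical) group id to the id of its brightest
    galaxy. Groups without a counterpart are simply left out.
    """
    group_by_brightest = {gal: gid for gid, gal in brightest_iii.items()}
    return {gid: group_by_brightest[gal]
            for gid, gal in brightest_ii.items()
            if gal in group_by_brightest}


def fraction_with_small_mass_change(matches, mass_ii, mass_iii,
                                    max_fractional_change=0.5):
    """Fraction of matched groups whose mass changes by less than the limit."""
    if not matches:
        return 0.0
    n_close = sum(
        1 for g2, g3 in matches.items()
        if abs(mass_iii[g3] - mass_ii[g2]) / mass_ii[g2] < max_fractional_change
    )
    return n_close / len(matches)
```

A Sample II group whose brightest galaxy is absorbed into a larger Sample III group has no counterpart under this definition, which is the merging case described above.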

4.1. Average Mass-to-Light Ratios 

The mass-to-light ratio of a dark matter halo expresses 
the efficiency with which stars have formed in 
that halo. Consequently, accurate measurements of the 
average mass-to-light ratios of dark matter haloes as 
a function of halo mass can put tight constraints on 
the physics of galaxy formation. The upper left-hand 
panel of Fig. 11 shows the average mass-to-light ratios, 
M_h/L_19.5, as a function of halo mass obtained from 
our group catalogue. Results are shown based on both 
mass indicators described above: L_19.5 (solid line), and 
M_stellar (open circles). The error-bars indicate the 68% 
percentiles from the distributions in each M_S mass bin. 
Since M_L is based on the ranking of L_19.5, there is no 
scatter in the corresponding mass-to-light ratios. As one 
can see, the mass-to-light ratios obtained from the two 
different mass indicators are in extremely good agreement 
with each other. Note that the results shown here 
are obtained from Sample II: those for Samples I and III 
are again very similar, and are consequently not shown. 
For reference, all mass-to-light ratios 
obtained from Sample II are listed in Table 4. 
Mass-to-light ratios have also been obtained from various 
observations, e.g. by Carlberg et al. (1996) from the CNOC 
sample and by Popesso et al. (2004) from RASS-SDSS. We 
defer a more detailed comparison with previous results to a 
forthcoming paper. 

In addition to the mass-to-light ratios, M_h/L_19.5, we 
can also use our group catalogues to compute the ratio 
between halo mass and characteristic stellar mass, 
M_h/M_stellar. The lower left-hand panel of Fig. 11 shows 
the average halo mass to stellar mass ratios, M_h/M_stellar, 
as a function of halo mass. Once again we show the results 
obtained from both mass indicators. This time the 
open circles with errorbars reflect the results obtained 
using L_19.5 as mass indicator, while the results based on 
the characteristic stellar mass are shown as a solid line. 
Since M_S is based on the ranking of M_stellar, this time 
there is no scatter in the results based on the characteristic 
stellar mass, while the errorbars reflect the 68% 
percentiles of the distributions in M_h/M_stellar where M_h 
is obtained from L_19.5. As for the mass-to-light ratios, 
the results based on both mass indicators are extremely 
similar and are listed in Table 4. 

Since our mass assignments require use of the halo 
mass function, which is cosmology dependent, it is important 
to investigate how the average M_h/L_19.5 and 
M_h/M_stellar change if we change cosmology. The dashed 
lines in the right-hand panels of Fig. 11 show the results 
obtained when adopting a WMAP1 cosmology (Ω_m = 
0.3, Ω_Λ = 0.7, h = 0.7, σ_8 = 0.9) instead of a WMAP3 
cosmology. Changing the cosmology changes (i) the 
luminosity and angular distances of all galaxies in the SDSS 
DR4, and thus their absolute magnitudes and (comoving) 
separations, and (ii) the halo mass function. The 
former has an almost negligible effect (small at the massive 
end), mainly because our sample of galaxies is restricted 
to z < 0.2. The halo mass function, however, has a 
strong impact: since there are more massive haloes in a 

Fig. 10. — Left-hand panel: Comparison of the group masses M_L and M_S, obtained using the two different mass indicators, L_19.5 and 
M_stellar, respectively. As expected, both masses agree extremely well: the standard deviation in Q, shown in the small panel and defined 
by equation (14), is less than 0.05 dex at the massive end. At the low-mass end σ_Q ~ 0.1 dex, due to the smaller average number of galaxies per 
group. Right-hand panel: The halo mass assigned to a group in Sample II versus the halo mass of the corresponding group in Sample III. 
Here a group in Sample III is defined as the counterpart of a group in Sample II if, and only if, it has the same brightest galaxy. Roughly 
95% of all groups in Sample II with Ml ^ 10^^ /i~^Mq have a counterpart in Sample III, and for more than 90% of these systems the 
difference in the assigned group mass is smaller than 50%. There is a small number of outliers, but they do not have a significant effect on 
the overall statistical properties of the group samples. 



TABLE 4 

Ratios between halo mass and luminosity and between halo mass and stellar mass 

log[M_h/h^{-1} M_⊙]   log[M_L/L_19.5]   log[M_S/L_19.5]            log[M_S/M_stellar]   log[M_L/M_stellar] 
                                        16%      50%      84%                           16%      50%      84% 
(1)                   (2)               (3)      (4)      (5)      (6)                  (7)      (8)      (9) 

11.80                 1.860             1.659    1.793    1.957    1.518                1.335    1.491    1.741 
12.00                 1.921             1.736    1.900    2.025    1.532                1.411    1.541    1.763 
12.20                 2.003             1.777    1.988    2.099    1.587                1.479    1.600    1.807 
12.40                 2.089             1.876    2.100    2.192    1.660                1.554    1.660    1.858 
12.60                 2.183             2.043    2.213    2.291    1.738                1.633    1.731    1.921 
12.80                 2.281             2.215    2.324    2.394    1.822                1.723    1.813    1.992 
13.00                 2.379             2.332    2.421    2.491    1.906                1.805    1.895    2.058 
13.20                 2.470             2.421    2.504    2.570    1.983                1.887    1.973    2.117 
13.40                 2.551             2.500    2.573    2.641    2.052                1.964    2.049    2.165 
13.60                 2.616             2.562    2.629    2.694    2.106                2.052    2.115    2.205 
13.80                 2.662             2.615    2.672    2.737    2.151                2.100    2.160    2.231 
14.00                 2.693             2.646    2.699    2.755    2.180                2.134    2.187    2.247 
14.20                 2.714             2.675    2.718    2.775    2.203                2.170    2.211    2.259 
14.40                 2.718             2.681    2.723    2.767    2.216                2.178    2.220    2.261 
14.60                 2.680             2.645    2.681    2.718    2.181                2.145    2.179    2.216 
14.80                 2.657             2.621    2.667    2.712    2.152                2.086    2.151    2.189 

Note. — Column (1): logarithm of the assigned halo mass. Column (2): average of the logarithm of the ratio between the assigned halo mass 
M_L and the characteristic luminosity L_19.5. Columns (3)-(5): 16, 50 and 84 percentiles of the distributions of the logarithm of the ratio between the 
assigned halo mass M_S and the characteristic luminosity. Column (6): logarithm of the ratio between assigned halo mass M_S and the characteristic 
stellar mass M_stellar. Columns (7)-(9): 16, 50 and 84 percentiles of the distributions of the logarithm of the ratio between the assigned halo mass 
M_L and the characteristic stellar mass. The mass-to-light ratios M_L/L_19.5 and M_S/L_19.5 are in units of h M_⊙/L_⊙, and the ratios between halo 
mass and the characteristic stellar mass, M_L/M_stellar and M_S/M_stellar, are in units of h. All these results correspond to a WMAP3 cosmology, and 
we emphasize once more that all luminosities are in the SDSS r-band and have been K + E corrected to z = 0.1. 
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Fig. 11. — Upper left-hand panel: the inferred mass-to-light ratios, M^/L-ig.s, of galaxy groups in the SDSS DR4 as function of their 
assigned halo mass, (see also Table 3). The solid line and open circles correspond to the mass-to-light ratios obtained using the group 
masses (based on the characteristic luminosity) and Ms (based on the characteristic stellar mass), respectively. By construction, there 
is no scatter in the inferred mass-to-light ratios when using Mi^, since the characteristic luminosity is assumed to be related to the halo 
mass on a one-to-one basis. Lower left-hand panel: Same as the upper left-hand panel, except that this time we plot the inferred ratios 
between halo mass and characteristic stellar mass, Mjj/Mstollar- This time there is no scatter when using Ms to infer the halo mass (solid 
line), and the errorbars reflect the 68% confidence levels of M^/Mstoiiar obtained using Mj^ as mass indicator. Right-hand panels: Solid 
lines are the same in the corresponding left-hand panels, and are obtained assuming a WMAP3 cosmology throughout. The dashed lines 
show the results obtained when using a WMAPl cosmology instead. Finally, the dotted lines correspond to the results obtained when 
using the group finder assuming a WMAP3 cosmology, but using a WMAPl halo mass function when inferring the final group masses. See 
text for a detailed discussion. 
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WMAPl cosmology than in a WMAP3 cosmology, the 
ranking assigns a larger halo mass to a given group, which 
results in larger values of Mh/Lig,^ and Mh/Msteiiar- 

Note that the mass-to-light ratios are also used in the
group finder to assign memberships to groups (see
Section 3.1). This suggests that whenever one decides to
change one or more cosmological parameters, one has to
rerun the entire group finder in order to obtain new mass
estimates (as we did for the WMAP1 and WMAP3
cosmologies shown in Fig. 11). This is impractical if one
intends to use the group catalogue to constrain different
cosmological models. In order to test whether we
can avoid having to rerun the group finder when changing
cosmology we proceed as follows. We run our group
finder over the SDSS DR4 assuming a WMAP3 cosmology,
but then, when we convert $L_{19.5}$ or $M_{\rm stellar}$ into halo
mass we use the WMAP1 halo mass function. The results
are shown in the right-hand panels of Fig. 11 as
dotted lines. Clearly, the impact of assuming a different
cosmology in the group finder is almost negligible. This
demonstrates that one can simply convert the $M_h/L_{19.5}$
and $M_h/M_{\rm stellar}$ listed in Table 3 to another cosmology
(as long as it is not too different from the WMAP3
cosmology adopted here), without having to rerun the group
finder over the data, by using the relation



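$$\int_{M_h}^{\infty} n(M_h')\,{\rm d}M_h' = \int_{\tilde{M}_h}^{\infty} \tilde{n}(\tilde{M}_h')\,{\rm d}\tilde{M}_h' \qquad (15)$$

Here $M_h$ and $n(M_h)$ are the mass and halo mass function
in the WMAP3 cosmology, and $\tilde{M}_h$ and $\tilde{n}(\tilde{M}_h)$ are the
corresponding values in the other cosmology.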
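
As a purely illustrative sketch of how equation (15) can be applied, the following Python code matches cumulative halo abundances between two cosmologies by bisection and then rescales one entry of Table 3. The two mass functions, their parameter values, and the helper names `make_cumulative_hmf` and `match_mass` are hypothetical stand-ins introduced here; in practice one would substitute the actual WMAP3 halo mass function and that of the target cosmology.

```python
import math

def make_cumulative_hmf(amplitude, m_star, alpha):
    """Return a toy cumulative mass function n(>M) that takes log10(M).

    Hypothetical placeholder shape; it is NOT a fit to WMAP1 or WMAP3.
    """
    def n_gt(log_m):
        x = 10.0 ** log_m / m_star
        return amplitude * x ** (-alpha) * math.exp(-x)
    return n_gt

def match_mass(log_m, n_gt_ref, n_gt_new, lo=8.0, hi=17.0, tol=1e-10):
    """Solve n_new(>M~) = n_ref(>M) for log10(M~) by bisection (eq. 15)."""
    target = n_gt_ref(log_m)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # n(>M) falls with M: too many haloes above mid means mid is too low.
        if n_gt_new(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed, made-up parameters (masses in h^-1 M_sun).
n_ref = make_cumulative_hmf(1.0e-3, 1.0e14, 0.9)  # stands in for WMAP3
n_new = make_cumulative_hmf(1.3e-3, 1.2e14, 0.9)  # stands in for another cosmology

log_m_ref = 13.00      # column (1) of Table 3
log_ratio_ref = 2.379  # column (2) of Table 3 at log M_h = 13.00
log_m_new = match_mass(log_m_ref, n_ref, n_new)
# The shift in log mass is applied directly to the logarithmic ratio.
log_ratio_new = log_ratio_ref + (log_m_new - log_m_ref)
print(f"log M_h: {log_m_ref:.2f} -> {log_m_new:.3f}")
print(f"log(M_h/L_19.5): {log_ratio_ref:.3f} -> {log_ratio_new:.3f}")
```

Under the assumption that the characteristic luminosity itself is left unchanged, the shift in log halo mass carries over directly to the logarithmic mass-to-light ratio. The converted value then applies at the new mass $\tilde{M}_h$ rather than at the original $M_h$.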

4.2. Group Velocity Dispersions 

For relatively massive groups, especially for groups 
with a sufficient number of member galaxies, one can 
estimate a dynamical group mass based on the velocity 
dispersion of the member galaxies. Following Yang et 
al. (2005a), we use the gapper estimator described by 
Beers, Flynn & Gebhardt (1990) to estimate the line-of- 
sight velocity dispersion of each individual group. The 
method involves ordering the set of recession velocities
$\{v_i\}$ of the $N$ group members and defining gaps as

$$g_i = v_{i+1} - v_i, \qquad i = 1, 2, \ldots, N-1. \qquad (16)$$

The rest-frame velocity dispersion is then estimated by

$$\sigma_{\rm gap} = \frac{\sqrt{\pi}}{(1+z_{\rm group})\,N(N-1)} \sum_{i=1}^{N-1} w_i g_i, \qquad (17)$$

where the weight is defined as $w_i = i(N-i)$. Since
there is a central galaxy in each group, which is assumed
to be at rest with respect to the dark matter halo, the
estimated velocity dispersion has to be corrected. This
results in a final velocity dispersion given by

$$\sigma = \sqrt{\frac{N}{N-1}}\,\sigma_{\rm gap}. \qquad (18)$$
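
Equations (16)-(18) translate directly into a few lines of code. The following Python sketch is one possible implementation. The function names, the use of $cz$ in km/s as the recession velocity, the unweighted mean redshift adopted for $z_{\rm group}$, and the example redshifts are all illustrative assumptions rather than part of the group finder itself.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def gapper_dispersion(velocities, z_group=0.0):
    """Gapper estimate sigma_gap of eq. (17) from a list of recession velocities."""
    v = sorted(velocities)
    n = len(v)
    if n < 2:
        raise ValueError("at least two members are required")
    # v[i] - v[i-1] is the gap g_i of eq. (16); its weight is w_i = i (N - i).
    weighted_gaps = sum(i * (n - i) * (v[i] - v[i - 1]) for i in range(1, n))
    return math.sqrt(math.pi) * weighted_gaps / ((1.0 + z_group) * n * (n - 1))

def corrected_dispersion(velocities, z_group=0.0):
    """Final dispersion of eq. (18), allowing for a central galaxy at rest."""
    sigma_gap = gapper_dispersion(velocities, z_group)  # raises if N < 2
    n = len(velocities)
    return math.sqrt(n / (n - 1.0)) * sigma_gap

# Hypothetical member redshifts of a single group (not taken from the catalogue).
redshifts = [0.0498, 0.0502, 0.0507, 0.0495, 0.0501]
z_group = sum(redshifts) / len(redshifts)  # assumed: simple mean redshift
velocities = [C_KM_S * z for z in redshifts]
sigma = corrected_dispersion(velocities, z_group)
print(f"sigma = {sigma:.1f} km/s")
```

Because the weighted sum of gaps equals the sum of all pairwise velocity differences, the estimate does not depend on the order in which the members are supplied.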



The upper panels of Fig. 12 show the line-of-sight
velocity dispersions of groups thus obtained as a function
of the assigned halo mass $M_L$ for groups with at least
3 (upper left-hand panel) and with at least 8 members
(upper right-hand panel). Solid triangles with errorbars
indicate the mean and the 1-$\sigma$ scatter of the line-of-sight
velocity dispersion in each mass bin. Clearly, there is
a good correlation between the velocity dispersion and
the mass $M_L$, indicating that the assigned masses are
reliable mass indicators. Compared to the theoretical
prediction of equation (6), which is shown as a solid
line, the line-of-sight velocity dispersions of the group
members are on average 40% lower. As discussed in
Yang et al. (2005a), this discrepancy is mainly due to the
fact that galaxies with the highest peculiar velocities in
a group are the most likely to be missed by the group
finder. To demonstrate this, the lower panels of Fig. 12
show the corresponding results obtained from our mock
group catalogue. In this case, the input velocity
dispersions for galaxies in haloes have a mean relation that is
given by the solid line, and the halo masses $M_h$ are the
true halo masses. Here again, we see that the velocity
dispersions among the selected group members are lower
than those implied by the halo masses.
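
The direction of this bias can be illustrated with a toy Monte Carlo experiment. In the Python sketch below, member velocities are drawn from a Gaussian and those beyond a sharp cut in |v| are discarded before the gapper estimator is applied. The sharp cut, its value of 1.5 times the input dispersion, and all other parameter values are assumptions chosen for illustration. They are a crude stand-in for the redshift-space membership criterion and are not meant to reproduce the 40% offset quoted above.

```python
import math
import random

def gapper(velocities):
    """Gapper dispersion (eq. 17 with z_group = 0) of a list of velocities."""
    v = sorted(velocities)
    n = len(v)
    weighted_gaps = sum(i * (n - i) * (v[i] - v[i - 1]) for i in range(1, n))
    return math.sqrt(math.pi) * weighted_gaps / (n * (n - 1))

def mean_dispersions(sigma_true=300.0, n_members=20, n_groups=500,
                     cut=1.5, seed=1):
    """Mean gapper dispersion with and without a toy high-velocity cut."""
    rng = random.Random(seed)
    full, trimmed = [], []
    for _ in range(n_groups):
        v = [rng.gauss(0.0, sigma_true) for _ in range(n_members)]
        # Toy selection: members with |v| above the cut are "missed".
        kept = [x for x in v if abs(x) <= cut * sigma_true]
        if len(kept) < 2:
            continue  # the estimator needs at least two members
        full.append(gapper(v))
        trimmed.append(gapper(kept))
    return sum(full) / len(full), sum(trimmed) / len(trimmed)

full_mean, trimmed_mean = mean_dispersions()
print(f"all members:     {full_mean:.1f} km/s")
print(f"after the cut:   {trimmed_mean:.1f} km/s")
print(f"ratio:           {trimmed_mean / full_mean:.2f}")
```

With all members retained the estimator recovers the input dispersion on average, whereas removing the high-velocity tail lowers it, which is qualitatively the behaviour seen in both the SDSS and mock group catalogues.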

5. SUMMARY 

In this paper, we have used a modified version of the 
halo-based group finding algorithm developed in Yang et 
al. (2005a) to construct group catalogues from the SDSS 
DR4. Changes and improvements in the group finder 
have been made in the following aspects: 

• In order to assign group memberships, we need to 
estimate masses for all tentative groups. Rather 
than using a model for the mass-to-light ratios, as
we did previously, we now use self-consistent mass- 
to-light ratios obtained from the group catalogue 
in an iterative way. 

• In order to correct the characteristic luminosity 
and stellar mass for missing members due to the 
magnitude limit of the survey, we use the mean 
correction factors obtained self-consistently from 
groups at low redshifts. 

• We correct for survey edge effects on the
groups by applying a correction factor.

• In order to estimate group masses, we use two
different mass indicators, one based on the
characteristic luminosity, $L_{19.5}$, and the other on the
characteristic stellar mass, $M_{\rm stellar}$.

Tests based on detailed mock SDSS DR4 catalogues show
that ~ 80% of all groups have a completeness > 80%.
The fraction of groups with a completeness of 100%
ranges from ~ 60% for the most massive groups to > 95%
for groups with masses in the range 10^^'^ $h^{-1}M_\odot$ <
$M_h$ < $10^{13}\,h^{-1}M_\odot$. On the order of 85% of all groups
have an interloper fraction < 50%, while ~ 65% of the
groups have zero interlopers.

Fig. 12. The line-of-sight velocity dispersions of galaxies in groups, obtained using the gapper estimator of Beers et al. (1990), as a
function of halo mass. In the upper panels we show results for groups in our SDSS DR4 group catalogue as a function of the assigned halo
mass $M_L$. In the lower panels the results have been obtained from the group catalogue extracted from our MGRS and are shown as a function
of the true halo mass $M_h$. Left- and right-hand panels show the results for groups with at least 3 and at least 8 members, respectively.
Solid triangles with errorbars indicate the mean and the 1-$\sigma$ scatter in each mass bin, while the solid line reflects the theoretical
expectation values based on equation (6). As discussed in the text, the line-of-sight velocity dispersions of group members are biased low,
due to the fact that galaxies with the highest peculiar velocities in a group are the most likely to be missed by the group finder.

We have applied our group finder to three galaxy samples
constructed from the SDSS DR4 galaxy catalogue:
Sample I, which only contains galaxies with measured
redshifts from the SDSS; Sample II, which also contains
those SDSS galaxies for which redshifts are available from
alternative sources (mainly from the 2dFGRS); and
Sample III, which also includes galaxies which due to fiber
collisions do not have a measured redshift, but which
have been assigned the redshift of their nearest neighbor.
We obtain a total of 295992, 301237 and 300049
groups from Samples I, II and III, respectively, and each
group is assigned two values for its halo mass based on
the ranking of either the characteristic luminosity or the
characteristic stellar mass of its member galaxies.

In this paper we have presented some of the basic prop- 
erties of the group catalogue, such as the distributions of 
richness, redshift and mass. In addition we have pre- 
sented the average ratios between halo mass and char- 
acteristic luminosity and between halo mass and charac- 
teristic stellar mass. Although these are cosmology de- 
pendent, we have demonstrated that it is straightforward 
to convert these to other cosmologies. A more detailed 
analysis of the group properties and their implications for 
halo occupation statistics, galaxy formation and cosmol- 
ogy will be presented in a series of forthcoming papers. 
As a final note, we mention that all group catalogues pre-
sented here are available from the authors upon request.